1 Basic ggplot syntax

The ggplot2 package builds a plot in layers, each corresponding to a different aspect of the plot. This modular approach makes it easy to modify individual aspects of a plot, at the cost of a steeper learning curve at the start.

First, let’s load the package and some sample data.

library(ggplot2)
data(diamonds)

Now, let’s create a basic ggplot object that holds the data and the aesthetic mappings; layers are added to it afterwards. I recommend using the standard ggplot() function rather than qplot() to take full advantage of this modular design.

p <- ggplot(data = diamonds, 
            mapping = aes(x = carat, 
                          y = price, 
                          color = cut))
summary(p)
## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## mapping:  x = carat, y = price, colour = cut
## faceting: facet_null()

Our object p now contains the data and a mapping that specifies the x-variable, the y-variable, and the variable that determines the color. It has no layers yet, so no data are drawn until a geom or stat is added.
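To make the layering concrete, here is a minimal sketch that counts the layers stored in a plot object before and after adding a geom. The name p_points and the alpha value are illustrative choices, not part of the original example.

```r
library(ggplot2)
data(diamonds)

# Same base object as above: data and mappings only
p <- ggplot(data = diamonds,
            mapping = aes(x = carat, y = price, color = cut))
n_before <- length(p$layers)

# Adding a geom returns a new object; p itself is left unchanged
p_points <- p + geom_point(alpha = 0.3)
n_after <- length(p_points$layers)

c(before = n_before, after = n_after)
```

Because `+` returns a new object, the same base p can be reused for every plot type that follows.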

2 Some sample plot types

p + geom_point()

p + geom_line()

p + geom_boxplot()
## Warning: position_dodge requires non-overlapping x intervals
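The warning most likely arises because carat is continuous: with color = cut as the only grouping, each cut level is summarized by a single box, and those boxes overlap along the x-axis. A minimal sketch of one alternative bins carat with ggplot2's cut_width() and draws one box per bin. The bin width of 0.25 and the name p_box are assumptions made for illustration.

```r
library(ggplot2)
data(diamonds)

# Bin carat into intervals of width 0.25 and draw one box per bin.
# No color mapping is used here, so each bin forms exactly one group.
p_box <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.25)))
p_box
```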

p + stat_bin2d()

p + geom_hex()  # requires the hexbin package to be installed

3 Modifying attributes

Axis labels, coordinate limits, orientation, color scales, facets, and theme elements can all be modified by adding further components to the plot.

p + geom_point() + 
    xlab("Size (carat)") + 
    ylab("Price (USD)") +
    coord_cartesian(xlim = c(0, 3), ylim = c(0, 10000))

p + geom_point() + 
    coord_flip() + 
    scale_color_manual(values = rainbow(5))

p + geom_point() + 
    facet_grid(clarity ~ .) + 
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.text = element_text(color = "black"), 
          panel.background = element_rect(color = "black", fill = NA))
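Theme settings are themselves objects, so they can be stored once and added to several plots. The sketch below assumes the hypothetical names clean_theme, p_facets, and p_lines; the theme settings are copied from the example above.

```r
library(ggplot2)
data(diamonds)

p <- ggplot(data = diamonds,
            mapping = aes(x = carat, y = price, color = cut))

# clean_theme is a hypothetical name for a reusable set of theme settings
clean_theme <- theme(panel.grid.major = element_blank(),
                     panel.grid.minor = element_blank(),
                     axis.text = element_text(color = "black"),
                     panel.background = element_rect(color = "black", fill = NA))

# The same theme object can be added to different plots
p_facets <- p + geom_point() + facet_grid(clarity ~ .) + clean_theme
p_lines <- p + geom_line() + clean_theme
```

Keeping the theme in one place means a styling change only has to be made once.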

4 More complex plots

The modular design also makes it straightforward to combine multiple plot types or to add features. For example, several built-in functions compute common summaries, such as a smoothed trend:

p + geom_point() + geom_smooth(method = "loess")

Of course, it is also possible to plot your own analyses:

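# Fit a Gaussian GLM (the glm() default) of price on carat, cut, and their interaction
demo_model <- glm(price ~ carat * cut, data = diamonds)
# Draw 1000 random rows and predict the price for each
sample_df <- diamonds[sample(nrow(diamonds), 1000), c("cut", "carat")]
sample_df$price <- predict(demo_model, sample_df)
# linewidth replaces size for lines as of ggplot2 3.4.0 (use size = 2 on older versions)
p + geom_point() + geom_line(data = sample_df, linewidth = 2)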
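Because sample_df holds randomly sampled rows, the fitted lines are drawn only at the carat values that happened to be sampled. A sketch of one alternative predicts on a regular grid instead. The name grid_df, the 50 grid points per cut level, and the alpha and linewidth values are illustrative assumptions, not part of the original analysis.

```r
library(ggplot2)
data(diamonds)

demo_model <- glm(price ~ carat * cut, data = diamonds)

# Regular grid: 50 evenly spaced carat values for each level of cut
grid_df <- expand.grid(
  carat = seq(min(diamonds$carat), max(diamonds$carat), length.out = 50),
  cut = levels(diamonds$cut)
)
# Restore the ordered factor so predict() sees the same coding as the model
grid_df$cut <- factor(grid_df$cut, levels = levels(diamonds$cut), ordered = TRUE)
grid_df$price <- predict(demo_model, newdata = grid_df)

# Points from the raw data, lines from the model predictions
# (linewidth assumes ggplot2 3.4.0 or later)
p_model <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.2) +
  geom_line(data = grid_df, linewidth = 1)
p_model
```

The grid extends each cut's line over the full carat range, including regions where that cut has little or no data, so the ends of the lines should be read with caution.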